User blog:Edwin Shade 2/Reverse Engineering .XML Files
Recently (and by recently I mean today), I took note of a pair of special pages, namely the Special:Export and Special:Import pages. These have allowed me to completely* duplicate the main wiki's mainspace in a time span I would never have thought possible!

*It may seem as if over 2,000 pages are missing, but I took the page names from GWiki's Special:ShortPages page, which excludes redirects. There are over 2,000 redirects on the wikia, which explains the omission of over 2,000 "pages" from the complete export.

But as with all things, I'm not content to leave it as a process that is taken care of by the computer alone. I've discovered that one can open these .xml files using any standard text editor, and what is more, .xml is so simple that even I can understand it! (Which is saying something, because I'm not too big on programming, despite planning to participate in the current codegolf contest on Scratch.) It is out of this curiosity that I've made this blog post, to document and maybe even educate about the nature of these .xml files.

How To Import/Export Files

To export files from a wikia, go to the Special:Export page on the wikia of your choice. If you try to go to Special:Import on a wikia that you have not been given admin rights on, you will not be allowed to import files. Anyone can export files, however. Now when you are at the Special:Export page, you should see the export form.

To make things easier, you can choose to export all pages that belong to a certain category, or write the page titles in the large text field given, one title per line. Simple.

Of course, I did not manually write every page title from the main wikia into the text field, so how did I do it? Easy. I went to Special:ShortPages*, CTRL+A'd (selected all), CTRL+C'd (copied all that was selected), then CTRL+V'd (pasted all that was copied) the information into sortmylist.com. From here, each line of your list will still carry a "(hist)" label and a "[N bytes]" note alongside the title.

*One may wonder why I selected titles from the Special:ShortPages page instead of Special:AllPages.
Well, the Special:ShortPages page is ordered by byte count, meaning you can estimate how much data the pages within a certain range will take up, and therefore avoid downloading files that are too large to be imported (>10 MB). In addition, Special:AllPages lists the pages in columns of three, which means you would have to manually press Enter many times just to fit one title per line. The Special:ShortPages page lists titles in just one column, which eliminates this hassle.

Now go to Replace, and a pop-up box should appear. Click either "him" or "her" and you will find it is a text field. Replace "him" with "(hist)" and blank the text field that formerly had "her". Now hit the "Replace" button above the text fields within the pop-up box. You should then see that all instances of the phrase "(hist)" have been replaced with " " (a space) in the list. Unless a page has the phrase "(hist)" within its title (which is bound to be very rare, and such pages are in fact nonexistent on the Googology Wikia), this removes only the unneeded phrase.

The next step is to go once more to the upper toolbar and choose Clean>Prune Items. Clicking it brings up another pop-up box. The colon in the slightly blue text field can be replaced with a "[", which will omit all the "[N bytes]" phrases at the ends of the titles. So insert "[" in the text field, then hit the right-hand button, which reads "Prune after ':'". After this you may close both boxes and admire your cleansed list of titles.

Now just CTRL+A, CTRL+C, then CTRL+V into the large text field on the Special:Export page of the wikia from which you got the titles. I recommend that you select the "Include only the current revision, not the full history" checkbox before exporting. Every revision stores the whole page again, so for a page built up through many small edits, the aggregate byte total of the page and all its revisions is roughly proportional to the square of that page's byte count. This means you will get files in the megabytes for pages that would otherwise only take up a few kilobytes.
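The list cleanup described above (removing "(hist)" and pruning everything from "[" onward) could also be done with a short script instead of sortmylist.com. What follows is only a minimal sketch in Python, assuming each copied line has the form "(hist) Title [N bytes]"; the function name `clean_title_list` and the sample byte counts are made up for illustration.

```python
def clean_title_list(raw_text):
    """Turn lines like '(hist) Title [123 bytes]' into bare page titles."""
    titles = []
    for line in raw_text.splitlines():
        line = line.replace("(hist)", "")  # same job as the Replace step
        line = line.split("[", 1)[0]       # same job as pruning after "["
        # Trim spaces, tabs and the invisible left-to-right mark (U+200E).
        # Assumption: such marks may surround the title in the copied text.
        title = line.strip(" \t\u200e")
        if title:  # skip blank lines
            titles.append(title)
    return titles


# Made-up sample of what the copied list might contain.
copied = "(hist) Googol [1,234 bytes]\n(hist) Graham's number [2,001 bytes]"
print("\n".join(clean_title_list(copied)))
```

The printed result is one title per line, ready to paste into the large text field on the export page.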
FANDOM has a 10 MB upload restriction (unlike Wikipedia), so you will not be able to import full-history .xml files that exceed it. I also recommend you select the "Include templates" checkbox, since it's easier than going back afterwards to either remove every use of a template, or go to the list of templates on a wiki and export and import them separately!

The "Save as file" checkbox just means you'll download a .xml file when you click the green "Export" button. If you do not select this checkbox, you will be led to a page on which you can easily view the .xml code. This is how I came to understand it.

Anyways, should you choose to download a file, you'll next want to go to the Special:Import page on a wikia where you are allowed to import, and click Choose File. After choosing the .xml file, you'll see you can make a comment. This will appear beside every import entry in the import log. I personally don't see how useful this is, but do what you wish with it. Lastly, click the gray button that reads "Upload file" and the process will begin. Depending on the number of pages, it could take a few seconds to a few minutes. The process is very speedy overall and you shouldn't have problems with it. I did notice, however, that when I tried to upload a .xml file of 2,000 pages, the upload froze about halfway through, which leads me to believe the limit may be 1,000 pages per batch.

How To Reverse Engineer .XML Files

Soon.
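As a small first step toward reading these files, a Python sketch can list what an exported .xml contains before it is imported. It assumes the usual MediaWiki export layout, which this post does not spell out: a `<mediawiki>` root holding `<page>` elements, each with a `<title>` and one or more `<revision>` elements. The function names and the two limit constants are hypothetical, the 10 MB figure is taken to mean 10 * 1024 * 1024 bytes, and the 1,000-page figure is only the guess made above, not a documented limit.

```python
import xml.etree.ElementTree as ET

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed reading of the 10 MB limit
GUESSED_PAGE_LIMIT = 1000            # a guess from one frozen upload


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_export(xml_text):
    """Return (titles, revision_count) for an export given as a string."""
    root = ET.fromstring(xml_text)
    titles = []
    revision_count = 0
    for element in root.iter():
        if local_name(element.tag) != "page":
            continue
        for child in element:
            name = local_name(child.tag)
            if name == "title":
                titles.append(child.text or "")
            elif name == "revision":
                revision_count += 1
    return titles, revision_count


def looks_importable(xml_text):
    """Check the export against the size limit and the guessed page limit."""
    titles, _ = summarize_export(xml_text)
    size_in_bytes = len(xml_text.encode("utf-8"))
    return size_in_bytes <= MAX_UPLOAD_BYTES and len(titles) <= GUESSED_PAGE_LIMIT


# Made-up, heavily trimmed sample; a real export carries more elements.
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Googol</title>
    <revision><text>A googol is 10^100.</text></revision>
  </page>
</mediawiki>"""
titles, revisions = summarize_export(sample)
print(titles, revisions)  # ['Googol'] 1
```

A revision count much larger than the number of titles is a hint that the file was exported with full history, which is the case most likely to run into the size limit.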